This listing of claims will replace all prior versions, and listings, of claims in the application. 
Listing of Claims: 

1. (currently amended) A method for detecting similar objects in a collection of such 
objects, comprising, for each of two objects: 

modifying a previous method for detecting similar objects so that memory 
requirements are reduced while avoiding false detections approximately as well as in the 
previous method, wherein the modifying comprises: 

combining a number of samples of features into each of a total number of 
supersamples, wherein the number of samples is reduced from a number of samples used in 
the previous method; 

compressing each of the total number of supersamples to a number of bits of 
precision, wherein the number of bits of precision is reduced from a number of bits of 
precision used in the previous method, and wherein the number of bits of precision is reduced 
by removing at least one least significant bit of the supersamples that were used in the 
previous method; and 

requiring a number of matching supersamples out of the total number of 
supersamples in order to conclude that the two objects are sufficiently similar, wherein the 
number of matching supersamples is greater than a number of matching supersamples 
required in the previous method. 

2. (original) The method of claim 1 wherein requiring the number of matching 
supersamples comprises requiring all but one of the total number of supersamples to match. 

3. (original) The method of claim 1 wherein requiring the number of matching 
supersamples comprises requiring all but two of the total number of supersamples to match. 

4. (original) The method of claim 1 wherein requiring the number of matching 
supersamples comprises requiring all supersamples to match. 

5. (original) The method of claim 1 wherein combining the number of samples into 
each of the total number of supersamples comprises combining four samples into each of the 
total number of supersamples, wherein the number of samples used in the previous method is 
14. 

6. (previously presented) The method of claim 5 wherein: 

compressing each supersample to the first number of bits of precision comprises 
recording each supersample to 16 bits of precision, wherein the second number of bits of 
precision used in the previous method is 64; and 

requiring the number of matching supersamples comprises requiring four 
supersamples of six to match, wherein the number of matching supersamples required in the 
previous method is two supersamples of six. 
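
The figures recited in claims 5 and 6 can be compared with a short calculation. The sketch below is illustrative only: the helper names are hypothetical, and the probability model (each sample of two objects matching independently with probability `p`) is an assumption that does not appear in the claims.

```python
from math import comb


def storage_bits(num_supersamples: int, bits_each: int) -> int:
    """Bits stored per object for its supersamples."""
    return num_supersamples * bits_each


def detection_probability(p: float, samples_per_supersample: int,
                          total: int, required: int) -> float:
    """Probability that at least `required` of `total` supersamples match.

    Assumption (not from the claims): each sample matches independently
    with probability p, so a supersample matches with probability
    p ** samples_per_supersample. Collisions introduced by truncating
    to fewer bits are ignored here.
    """
    q = p ** samples_per_supersample
    return sum(comb(total, k) * q ** k * (1 - q) ** (total - k)
               for k in range(required, total + 1))


# Figures recited in claims 5 and 6.
previous_bits = storage_bits(6, 64)   # six supersamples at 64 bits each
current_bits = storage_bits(6, 16)    # six supersamples at 16 bits each

# Previous method: 14 samples per supersample, two of six must match.
# Claimed method: 4 samples per supersample, four of six must match.
for p in (0.5, 0.8, 0.95):
    prev = detection_probability(p, 14, 6, 2)
    curr = detection_probability(p, 4, 6, 4)
    print(f"p={p}: previous={prev:.4f} current={curr:.4f}")
```

Under these figures the per-object storage falls from 384 bits to 96 bits, while the stricter match requirement offsets the weaker individual supersamples.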

7. (original) The method of claim 5 wherein requiring the number of matching 
supersamples comprises requiring five supersamples of seven to match, wherein the number 
of matching supersamples required in the previous method is two supersamples of six. 

8. (original) The method of claim 1 wherein the objects are documents, and the 
method is used in association with a search engine query service to determine clusters of 
query results that are near-duplicate documents. 

9. (original) The method of claim 8, further comprising selecting a single document 
in each cluster to report. 

10. (original) The method of claim 9 wherein selecting the single document is by 
way of a ranking function. 

11. (currently amended) A method for determining groups of near-duplicate items in a 
search engine query result, comprising, for each of two items being compared: 

combining four samples of features into each of six supersamples; 

compressing each supersample to 16 bits of precision by removing each bit of the 
supersample other than the 16 most significant bits of the 
supersample; and 

requiring four of the six supersamples to match. 
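
The three steps of claim 11 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the claims do not say how samples are produced or combined, so the sketch assumes each sample is an unsigned 64-bit integer and uses BLAKE2b as a stand-in combining function. All function names are hypothetical.

```python
import hashlib
from typing import List, Sequence

SAMPLES_PER_SUPERSAMPLE = 4   # claim 11: four samples per supersample
NUM_SUPERSAMPLES = 6          # claim 11: six supersamples
KEPT_BITS = 16                # claim 11: keep the 16 most significant bits
REQUIRED_MATCHES = 4          # claim 11: four of the six must match


def supersample(samples: Sequence[int]) -> int:
    """Combine a group of samples into one 64-bit value.

    Assumption: samples are unsigned 64-bit integers, and the hash
    function is a stand-in chosen for this sketch.
    """
    data = b"".join(s.to_bytes(8, "big") for s in samples)
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def compress(value: int, kept_bits: int = KEPT_BITS, width: int = 64) -> int:
    """Drop every bit except the `kept_bits` most significant ones."""
    return value >> (width - kept_bits)


def sketch(samples: Sequence[int]) -> List[int]:
    """Build the six compressed supersamples for one item."""
    expected = SAMPLES_PER_SUPERSAMPLE * NUM_SUPERSAMPLES
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, got {len(samples)}")
    groups = [samples[i:i + SAMPLES_PER_SUPERSAMPLE]
              for i in range(0, expected, SAMPLES_PER_SUPERSAMPLE)]
    return [compress(supersample(g)) for g in groups]


def near_duplicates(a: Sequence[int], b: Sequence[int],
                    required: int = REQUIRED_MATCHES) -> bool:
    """Compare supersamples position by position and count the matches."""
    matches = sum(x == y for x, y in zip(a, b))
    return matches >= required
```

Two items whose samples differ only within one group of four still share at least five compressed supersamples, so they would be reported as near-duplicates under the four-of-six rule.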

12. (original) The method of claim 11, further comprising selecting a single 
document in each cluster to report. 

13. (original) The method of claim 12 wherein selecting the single document is by 
way of a ranking function. 

14. (currently amended) A method for determining groups of near-duplicate items in a 
search engine query result, comprising, for each of two items being compared: 

combining four samples of features into each of seven supersamples; 

compressing each supersample to 16 bits of precision by removing each bit of the 
supersample other than the 16 most significant bits of the 
supersample; and 

requiring five of the seven supersamples to match. 

15. (original) The method of claim 14, further comprising selecting a single 
document in each cluster to report. 

16. (original) The method of claim 15 wherein selecting the single document is by 
way of a ranking function. 

17. (currently amended) A computer-readable medium embodying machine instructions 
implementing a current method for detecting similar objects in a collection of such objects, 
wherein the current method comprises modification of a previous method for detecting 
similar objects so that memory requirements are reduced while avoiding false detections 
approximately as well as in the previous method, the current method comprising: 



DOCKET NO.: 307238.01 / MSFT-5031 
Application No.: 10/805,805 
Office Action Dated: April 10, 2007 
PATENT 
REPLY FILED UNDER EXPEDITED PROCEDURE PURSUANT TO 37 CFR § 1.116 

combining a number of samples of features into each of a total number of 
supersamples, wherein the number of samples is reduced from a number of samples used in 
the previous method; 

compressing each of the total number of supersamples to a number of bits of 
precision, wherein the number of bits of precision is reduced from a number of bits of 
precision used in the previous method, wherein the number of bits of precision is reduced by 
removing at least one least significant bit that were used in the previous method; and 

requiring a number of matching supersamples in order to conclude that the two 
objects are sufficiently similar, wherein the number of matching supersamples is greater than 
a number of matching supersamples required in the previous method. 

18. (original) The computer-readable medium of claim 17 wherein requiring the 
number of matching supersamples comprises requiring all but one of the total number of 
supersamples to match. 

19. (original) The computer-readable medium of claim 17 wherein requiring the 
number of matching supersamples comprises requiring all but two of the total number of 
supersamples to match. 

20. (original) The computer-readable medium of claim 17 wherein requiring the 
number of matching supersamples comprises requiring all supersamples to match. 

21. (currently amended) A computer-readable medium embodying machine instructions 
implementing a method for determining groups of near-duplicate items in a search engine 
query result, comprising, for each of two items being compared: 

combining four samples of features into each of six supersamples; 

compressing each supersample to 16 bits of precision by removing each bit of the 
supersample other than the 16 most significant bits of the 
supersample; and 

requiring four of the six supersamples to match. 


22. (currently amended) A computer-readable medium embodying machine instructions 
implementing a method for determining groups of near-duplicate items in a search engine 
query result, comprising, for each of two items being compared: 

combining four samples of features into each of seven supersamples; 

compressing each supersample to 16 bits of precision by removing each bit of the 
supersample other than the 16 most significant bits of the 
supersample; and 

requiring five of the seven supersamples to match. 